Sequencing and Raw Sequence Data Quality Control ◾ 47
7. Cock PJ, Fields CJ, Goto N, Heuer ML, Rice PM: The Sanger FASTQ file format for sequences
with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res 2010,
38(6):1767–1771.
8. FASTQ Files [https://support.illumina.com/help/BaseSpace_OLH_009008/Content/Source/Informatics/BS/FASTQFiles_Intro_swBS.htm]
9. Leinonen R, Sugawara H, Shumway M: The sequence read archive. Nucleic Acids Res 2011,
39(Database issue):D19–21.
10. Andrews S: FastQC: A Quality Control Tool for High Throughput Sequence Data. Babraham
Bioinformatics, Babraham Institute, Cambridge, United Kingdom; 2010.
11. Chen Y-C, Liu T, Yu C-H, Chiang T-Y, Hwang C-C: Effects of GC bias in
next-generation-sequencing data on de novo genome assembly. PLoS One 2013, 8(4):e62856.
12. Lightfield J, Fram NR, Ely B: Across bacterial phyla, distantly-related genomes with similar
genomic GC content have similar patterns of amino acid usage. PLoS One 2011, 6(3):e17677.
13. Romiguier J, Ranwez V, Douzery EJ, Galtier N: Contrasting GC-content dynamics across 33
mammalian genomes: relationship with life-history traits and chromosome sizes. Genome
Res 2010, 20(8):1001–1009.
14. FASTX-toolkit [http://hannonlab.cshl.edu/fastx_toolkit/]
15. Bolger AM, Lohse M, Usadel B: Trimmomatic: A flexible trimmer for Illumina sequence data.
Bioinformatics 2014, 30(15):2114–2120.
16. Chen S, Zhou Y, Chen Y, Gu J: fastp: An ultra-fast all-in-one FASTQ preprocessor.
Bioinformatics 2018, 34(17):i884–i890.